{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Julia很快\n",
    "\n",
    "基准测试程序数值（benchmarks）通常被用来比较编程语言的性能。这些benchmarks可能引起很长的讨论，首先是作为基准测试的到底是什么，再有就是究竟是什么导致不同语言间的差异。这些简单的问题有时比你乍想之下复杂得多。\n",
    "\n",
    "这个notebook目的在于让你实现一个简单的基准测试。通过这个notebook可以看到作者的配有四核Macbook Pro表现如何，或者你可以自己跑跑看。\n",
    "\n",
    "(材料来自于MIT的Steven Johnson的一场演讲: https://github.com/stevengj/18S096/blob/master/lectures/lecture1/Boxes-and-registers.ipynb.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 大纲\n",
    "\n",
    "- 定义sum函数\n",
    "- 实现并测试sum函数在...\n",
    "    - C (自定义)\n",
    "    - C (使用-ffast-math的自定义)\n",
    "    - python (内置)\n",
    "    - python (numpy)\n",
    "    - python (自定义)\n",
    "    - Julia (内置)\n",
    "    - Julia (自定义)\n",
    "    - Julia (使用SIMD的自定义)\n",
    "- benchmarks总结"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# `sum`: 一个很容易理解的函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "考虑**求和**函数`sum(a)`，它是用来计算\n",
    "$$\n",
    "\\mathrm{sum}(a) = \\sum_{i=1}^n a_i,\n",
    "$$\n",
    "其中$n$是`a`的长度。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "a = rand(10^7)  # 在[0,1)上均匀分布的随机数组成的1维向量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "预期结果应该在0.5 * 10^7左右，因为所有元素的平均值应该是0.5左右"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 几种语言下的几种基准测试方法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@time sum(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@time sum(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@time sum(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`@time`宏会产生受干扰的结果，所以不是做基准测试的最佳选择！\n",
    "\n",
    "好在Julia有个`BenchmarkTools.jl`包来简单并准确地进行基准测试："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using Pkg\n",
    "Pkg.add(\"BenchmarkTools\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "using BenchmarkTools"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#  1. C语言\n",
    "\n",
    "C语言对人类不友好但是对机器友好，经常作为一个黄金标准。在C语言的两倍以内通常来说就比较理想了。然而，即使仅考虑C语言，不同的C语言程序员也可能采用或不采用不同的优化方式。\n",
    "\n",
    "原作者不会C语言，所以他看不懂下面这个cell中的代码，但是他知道Julia回话中可以编译、运行C语言代码。注意`\"\"\"`所包围的是个多行字符串。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "using Libdl\n",
    "C_code = \"\"\"\n",
    "#include <stddef.h>\n",
    "double c_sum(size_t n, double *X) {\n",
    "    double s = 0.0;\n",
    "    for (size_t i = 0; i < n; ++i) {\n",
    "        s += X[i];\n",
    "    }\n",
    "    return s;\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "const Clib = tempname()  # 创建一个临时文件\n",
    "\n",
    "\n",
    "# 通过将C_code导入gcc来编译成共享链接库\n",
    "# (只有装了gcc才行):\n",
    "\n",
    "open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * \".\" * Libdl.dlext) -`, \"w\") do f\n",
    "    print(f, C_code) \n",
    "end\n",
    "\n",
    "# 定义一个调用这个C语言函数的Julia函数：\n",
    "c_sum(X::Array{Float64}) = ccall((\"c_sum\", Clib), Float64, (Csize_t, Ptr{Float64}), length(X), X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "c_sum(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "c_sum(a) ≈ sum(a)  # 输入\\approx后按<TAB>键得到≈符号"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c_sum(a) - sum(a)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "≈  # `isapprox` 函数的别名"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "?isapprox"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在我们可以在Julia中直接对C语言代码进行基准测试了："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "c_bench = @benchmark c_sum($a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "println(\"C: 最快$(minimum(c_bench.times) / 1e6) 毫秒\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "d = Dict()  # 字典，也就是关联数组（键值对）\n",
    "d[\"C\"] = minimum(c_bench.times) / 1e6  # 毫秒\n",
    "d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "using Plots\n",
    "gr()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using Statistics  # 导入统计包以计算标准差\n",
    "t = c_bench.times / 1e6  # 毫秒\n",
    "m, σ = minimum(t), std(t)\n",
    "\n",
    "histogram(t, bins=500,\n",
    "    xlim=(m - 0.01, m + σ),\n",
    "    xlabel=\"milliseconds\", ylabel=\"count\", label=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. 使用-ffast-math的C语言\n",
    "\n",
    "如果我们允许C重新安排浮点操作，它就会使用SIMD指令(single instruction, multiple data)进行矢量化。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "const Clib_fastmath = tempname()   # 创建临时文件\n",
    "\n",
    "# 同上但是增加了-ffast-math标志\n",
    "open(`gcc -fPIC -O3 -msse3 -xc -shared -ffast-math -o $(Clib_fastmath * \".\" * Libdl.dlext) -`, \"w\") do f\n",
    "    print(f, C_code) \n",
    "end\n",
    "\n",
    "# 定义一个调用这个C语言函数的Julia函数：\n",
    "c_sum_fastmath(X::Array{Float64}) = ccall((\"c_sum\", Clib_fastmath), Float64, (Csize_t, Ptr{Float64}), length(X), X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c_fastmath_bench = @benchmark $c_sum_fastmath($a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d[\"C -ffast-math\"] = minimum(c_fastmath_bench.times) / 1e6  # 毫秒"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Python的内置`sum` "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`PyCall`包给Julia提供了Python接口："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "using Pkg; Pkg.add(\"PyCall\")\n",
    "using PyCall"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# 获得Python的内置\"sum\"函数：\n",
    "pysum = pybuiltin(\"sum\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "pysum(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "pysum(a) ≈ sum(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "py_list_bench = @benchmark $pysum($a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "d[\"Python built-in\"] = minimum(py_list_bench.times) / 1e6\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Python: `numpy` \n",
    "\n",
    "## 条件允许的情况下利用硬件\"SIMD\"\n",
    "\n",
    "`numpy`是一个Python的优化过的C语言库。\n",
    "在Julia版Python中的安装方法为："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "using Pkg; Pkg.add(\"Conda\")\n",
    "using Conda"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Conda.add(\"numpy\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "numpy_sum = pyimport(\"numpy\")[\"sum\"]\n",
    "\n",
    "py_numpy_bench = @benchmark $numpy_sum($a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "numpy_sum(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "numpy_sum(a) ≈ sum(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "d[\"Python numpy\"] = minimum(py_numpy_bench.times) / 1e6\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5. Python（自定义）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "py\"\"\"\n",
    "def py_sum(A):\n",
    "    s = 0.0\n",
    "    for a in A:\n",
    "        s += a\n",
    "    return s\n",
    "\"\"\"\n",
    "\n",
    "sum_py = py\"py_sum\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "py_hand = @benchmark $sum_py($a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "sum_py(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "sum_py(a) ≈ sum(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "d[\"Python hand-written\"] = minimum(py_hand.times) / 1e6\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 6. Julia (内置) \n",
    "\n",
    "## 是直接用Julia写的，不是C！"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "@which sum(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "j_bench = @benchmark sum($a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "d[\"Julia built-in\"] = minimum(j_bench.times) / 1e6\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 7. Julia (自定义) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "function mysum(A)   \n",
    "    s = 0.0  # s = zero(eltype(a))\n",
    "    for a in A\n",
    "        s += a\n",
    "    end\n",
    "    s\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "j_bench_hand = @benchmark mysum($a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "d[\"Julia hand-written\"] = minimum(j_bench_hand.times) / 1e6\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 8. Julia (使用simd的自定义) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "function mysum_simd(A)   \n",
    "    s = 0.0  # s = zero(eltype(A))\n",
    "    @simd for a in A\n",
    "        s += a\n",
    "    end\n",
    "    s\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "j_bench_hand_simd = @benchmark mysum_simd($a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mysum_simd(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "d[\"Julia hand-written simd\"] = minimum(j_bench_hand_simd.times) / 1e6\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 总结"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "for (key, value) in sort(collect(d), by=last)\n",
    "    println(rpad(key, 25, \".\"), lpad(round(value; digits=1), 6, \".\"))\n",
    "end"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Julia 1.0.5",
   "language": "julia",
   "name": "julia-1.0"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.0.5"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "212px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": "2",
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
